Definiteness Predictions for Japanese Noun Phrases
Abstract
One of the major problems when translating from Japanese into a European language such as German or English is determining the definiteness of noun phrases in order to choose the correct determiner in the target language. Even though noun phrase reference in Japanese is said to depend in large part on the discourse context, we show that in many cases there also exist linguistic markers for definiteness. We use these to build a rule hierarchy that predicts 79.5% of the articles with an accuracy of 98.9% from syntactic-semantic properties alone, yielding an efficient pre-processing tool for the computationally expensive context checking.
* I would like to thank my colleagues Johan Bos, Björn Gambäck, Yoshiki Mori, Michael Paul, Manfred Pinkal, C.J. Rupp, Atsuko Shimada, Kristina Striegnitz and Karsten Worm for their valuable comments and support. This research was supported by the German Ministry of Education, Science, Research and Technology (BMBF) within the Verbmobil framework under grant no. 01 IV 701 R4.
1 Introduction
One of the major problems when translating from Japanese into a European language such as German or English is the insertion of articles. Both German and English distinguish between the definite and the indefinite article, the former, in general, indicating some degree of familiarity with the referent, the latter referring to something new. Thus, by using a definite article, the speaker expects the hearer to be able to identify the object being talked about, whilst the use of an indefinite article introduces a new referent into the discourse context (Heim, 1982). In contrast, the reference of Japanese noun phrases depends in large part on the discourse context, taking a previous mention of an object and all properties that can be inferred from it, as well as world knowledge, as indicators for definite reference. Any noun phrase whose referent cannot be recovered from the discourse context will in turn be taken as indefinite.
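The context-based default described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that a referent counts as recoverable only when the same string has been mentioned before, and it ignores inferable properties and world knowledge entirely. The function name `assign_by_context` and the example nouns are hypothetical.

```python
def assign_by_context(noun_phrases):
    """Label noun phrases by previous mention only (a simplifying assumption).

    A phrase that was already mentioned is labelled "definite"; any other
    phrase is labelled "indefinite" and then added to the discourse context.
    Inference and world knowledge, which the text also names, are not modelled.
    """
    context = set()   # referents introduced so far
    labels = []
    for phrase in noun_phrases:
        if phrase in context:
            labels.append((phrase, "definite"))
        else:
            labels.append((phrase, "indefinite"))
            context.add(phrase)
    return labels


if __name__ == "__main__":
    # Hypothetical romanised nouns: "hon" (book), "hito" (person).
    for phrase, label in assign_by_context(["hon", "hito", "hon"]):
        print(phrase, label)
```

Even in this reduced form, every noun phrase has to be checked against the whole accumulated context, which hints at why a cheaper pre-processing step is attractive.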
However, noun phrases can also be explicitly marked for definiteness, forcing an interpretation of the referent independent of the discourse context. In this way, it is possible to trigger accommodation of previously unknown specific referents, or to get an indefinite reading even if an object of the same type has already been introduced. For machine translation, it is important to find a systematic way of extracting the syntactic and semantic information responsible for marking the reference of noun phrases, in order to choose the correct articles in the target language. In this paper, we propose a rule hierarchy for this purpose that can be used as a pre-processing tool for context checking. All noun phrases marked for definiteness in any way are assigned their referential property, leaving the others underspecified.
After giving a short outline of related work in the next section, we introduce our rule hierarchy in section 3. The resulting algorithm is evaluated in section 4, and section 5 addresses implementation issues. Finally, section 6 gives a conclusion.
2 Related Work
The problem of article selection when translating from Japanese into any language requiring the use of articles has been addressed systematically by only a few authors. Murata and Nagao (1993) define a heuristic rule base for definiteness assignment, consisting of 86 weighted rules. These rules use surface in-
Similar resources
A Study of Inflectional Categories of Noun in Sistani Dialect
The present article aims to provide a synchronic study of the inflectional or morpho-syntactic categories of noun in Sistani dialect. These categories comprise person, number, gender or noun class, definiteness, case, and possession. Linguistic data was collected via recording free speech, and interviewing with 30 (15 females, 15 males) illiterate Sistani language consultants of age 40–102 year...
Determiners and Number in English contrasted with Japanese, as exemplified in Machine Translation
The fact that concepts are grammaticalized differently in different languages is a major problem for translation, especially for machine translation. Two major examples of this are syntactic number, and the use of (in)definite articles (a, some, the). In languages such as English, nouns are marked for number and the choice of article (or of no article) must be made for every noun phrase. In con...
Definiteness and Eventive Nominals
This paper discusses the interpretation of complex noun phrases. More specifically, it investigates the type of object that a noun phrase denotes. It has been recognized for some time that noun phrases must be able to denote individuals in some circumstances, events in other cases, propositions in still others (Zucchi 1993, Peterson 1997). I will show that, on the one hand, there are various ...
Definiteness in the Hebrew Noun Phrase
This paper suggests an analysis of Modern Hebrew noun phrases in the framework of HPSG. It focuses on the peculiar properties of the definite article, including the requirement for definiteness agreement among various elements in the noun phrase, definiteness inheritance in construct-state nominals, the fact that the article does not combine with constructs and the similarities between construct...
An HPSG Account of Danish Pre-nominals
This article addresses the issue of selection restrictions for noun phrase specifiers. Danish data is presented which shows that definiteness plays an important role in this respect. It is pointed out that an analysis is required in which the specifier, when present, leaves a mark on the projected phrase. This is achieved by assuming that specifiers are syntactic heads of noun phrase constructi...